fdaPDE (Palummo et al.,
2025) is a C++ library with an interface to R for
physics-informed spatial and functional data analysis, at the
intersection of statistics and numerical analysis. The library provides
advanced statistical methods designed for data located over complex
spatial domains, ranging from irregular planar regions and curved
surfaces to linear networks and volumes, possibly evolving over time.
The class of methods implemented in fdaPDE features
regularization terms based on Partial Differential Equations (PDEs),
which allow incorporating information derived from the physics of the
problem under study into the statistical modeling. This makes
fdaPDE an extremely flexible tool for the analysis of
complex data. For a review of this class of methods, refer to Sangalli (2021).
fdaPDE offers a wide range of modeling capabilities –
including regression, nonparametric density estimation, functional data
analysis, and more – for data located over a spatial domain, possibly
evolving over time. Among the broad range of modeling capabilities
offered by fdaPDE, we focus here on the quantile
spatial regression method.
Regression analysis is a widely used tool for studying how a response variable behaves conditionally on a set of covariates. However, when data substantially deviate from the Gaussian assumption – a scenario frequently encountered in real-world applications – the conditional mean may be uninformative and provide limited insight. Quantile regression models overcome this limitation by offering a flexible framework to analyze the entire conditional distribution, particularly the tail characteristics of the response variable.
To estimate spatial quantile fields for complex spatial phenomena, we refer to the method introduced in Castiglione et al. (2025). The proposed model is particularly effective when the data exhibit skewness, heteroscedasticity, or extreme values – especially when these characteristics vary across space due to physical factors such as wind streams or water currents. The method combines the quantile regression framework with spatial smoothing driven by Partial Differential Equations (PDEs).
Given a real-valued random variable \(Y\,,\) we consider \(n\) realizations \(\left\{y_i\right\}_{i=1}^n \subset \mathbb{R}\,,\) each observed at specific spatial locations \(\left\{\mathbf{p}_i\right\}_{i=1}^n \subset \mathcal{D}\,,\) where \(\mathcal{D} \subset \mathbb{R}^d\,,\) for \(d\geq1\,,\) denotes the spatial domain under consideration. Our aim is to estimate the corresponding unknown \(\alpha-\)quantile function \(f_\alpha : \mathcal{D} \to \mathbb{R}\,,\) for any probability level \(\alpha \in (0,1)\,.\) The nonparametric spatial regression model for the \(\alpha-\)quantile reads as: \[Q_{Y_i|\mathbf{p}_i}(\alpha) = f_\alpha(\mathbf{p}_i)\,, \quad \quad i=1,...,n\,. \] We estimate the unknown quantile spatial field \(f_\alpha\) by minimizing the penalized loss functional: \[ J_\alpha(f) = \frac{1}{n} \sum_{i=1}^n \rho_{\alpha}(y_i - f(\mathbf{p}_i) ) \ + \lambda \int_\mathcal{D} (Lf(\mathbf{p}) - u(\mathbf{p}))^2 \, d\mathbf{p} \,,\]
which balances a data fidelity term, expressed by the pinball loss function \(\rho_\alpha(t) = \frac{1}{2}|t|-(\frac{1}{2}-\alpha)t\,,\) with a regularization term based on a problem-specific PDE. The trade-off between these two terms is controlled by the smoothing parameter \(\lambda\,.\) The optimal value of the smoothing parameter is selected via Generalized Cross-Validation (GCV) minimization.
Estimation is performed using the Expectation-Maximization (EM) algorithm. Then, the resulting infinite-dimensional solution is discretized through Finite Element Method (FEM) over a computational mesh that approximates the spatial domain \(\mathcal{D}\,.\) FEM ensures computational efficiency and enables accurate analysis of data scattered over irregularly shaped domains.
The considered modeling framework can be easily generalized to include spatial covariates, thus obtaining a semiparametric version of the previous model. Let \(\left\{\mathbf{x}_i\right\}_{i=1}^n \subset \mathbb{R}^q\) be a vector of \(q\) spatial covariates measured at the locations \(\left\{\mathbf{p}_i\right\}_{i=1}^n\,.\) The semiparametric spatial regression model for the \(\alpha-\)quantile reads as: \[Q_{Y_i|\mathbf{x}_i, \mathbf{p}_i}(\alpha) = \mathbf{x}_i^\top \boldsymbol{\beta}_\alpha + f_\alpha(\mathbf{p}_i)\,, \quad \quad i=1,...,n\,.\]
We can now jointly estimate the unknown vector of coefficients \(\boldsymbol{\beta}_\alpha\) and the quantile spatial field \(f_\alpha\,.\) To do so, we include the parametric term into the data fidelity criterion of the penalized loss functional as follows: \[ J_\alpha(f) = \frac{1}{n} \sum_{i=1}^n \rho_{\alpha}(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}- f(\mathbf{p}_i) ) \ + \lambda \int_\mathcal{D} (Lf(\mathbf{p}) - u(\mathbf{p}))^2 \, d\mathbf{p}\,.\]
Finally, the estimation and discretization procedures follow the same steps as in the nonparametric model.
The method for Quantile Spatial Regression with Partial Differential
Equation regularization presented in this vignette, hereafter referred
to as QSR-PDE, is implemented within the newest version
fdaPDE C++ library (Palummo et al.,
2025). We load the library in the working environment as
follows:
As a benchmark application in enviornmental sciences, we analyze the Switzerland rainfall dataset, which collects \(462\) rainfall measurements (in \(10 \, \mu m\) units) recorded on May 8, 1986. This dataset is well known in the statistical community, with several studies dedicated to its analysis, including Dubois et al. (2003) and Dubois et al. (2007). Starting from this set of georeferenced data, the goal is to estimate spatial quantile maps of the rainfall distribution. To compare the median tendency of the phenomenon with tail behavior, we focus on the estimates for both \(50\%\) and \(90\%\) quantile fields, assessing whether some regions are more prone to experience extreme values and to identify potential spatial patterns associated with high-risk events.
First, we load the boundary nodes of our domain of interest, which are stored sequentially in a clockwise order. We then construct the matrix of boundary segments, with each row containing the indices of the boundary nodes that define a segment.
## [SPATIAL DOMAIN]
# Load the boundary nodes
boundary_nodes <- read.table(file = "../data/QSRPDE_2D/QSRPDE_2D_boundary_nodes.txt",
header = TRUE)
# Create boundary segments (consecutive boundary nodes are connected)
boundary_segments <- cbind(1:(nrow(boundary_nodes)-1), 2:nrow(boundary_nodes))
# Close the loop (connect the last node to the first)
boundary_segments <- rbind(boundary_segments, c(nrow(boundary_nodes), 1))The domain can be interactively visualized using the
mapview (Appelhans et al.,
2023) R package, once boundary nodes are converted into a
sf (Pebesma, 2025)
object.
# Convert boundary into a sf object
boundary_nodes_sf <- st_as_sf(x = rbind(boundary_nodes, boundary_nodes[1,]),
coords = c("lon", "lat"), crs = 4326)
boundary_polygon <- st_polygon(list(st_coordinates(boundary_nodes_sf)))
boundary_sfc <- st_sfc(geometry = boundary_polygon, crs = 4326)
boundary_sf <- st_sf(geometry = boundary_sfc)
# Interactive plot
mapview(boundary_sf,
col.regions = "gray25", alpha.regions = 0.25, col = "black", lwd = 1.5,
legend = FALSE, layer.name = "domain")Figure 1: Spatial domain of interest – Switzerland.
Now we build a regular mesh of the spatial domain
under consideration. We create a mesh based on boundary nodes and
segments, and then we refine it by setting the maximum element area to
\(0.0045\) and the minimum angle to 30
degrees. The triangulation is constructed using the
RTriangle (Shewchuk & Sterratt,
2025) R package. Other software can be used to get the
triangulation; for example, it can be constructed through the
fmesher (Lindgren, 2025) R
package.
# Define a planar straight-line graph object
p <- pslg(P = boundary_nodes, S = boundary_segments)
# Create a regular mesh of the spatial domain
mesh <- triangulate(p = p, Y = FALSE, D = TRUE)
if (is.null(mesh$H)) mesh$H <- matrix(numeric(0), ncol = 2)
# Refine the mesh of the spatial domain
mesh <- triangulate(mesh, a = 0.0045, q = 30, D = TRUE)We convert the triangulation object from RTriangle so
that it can be read by fdaPDE.
# Set up the triangulation for fdaPDE
mesh = triangulation(nodes = mesh$P, cells = mesh$T, boundary = mesh$PB)
# Nodes coordinates
head(mesh$nodes)
#> [,1] [,2]
#> [1,] 9.658689 47.13001
#> [2,] 9.863359 47.10578
#> [3,] 9.971987 47.01739
#> [4,] 10.090422 46.95084
#> [5,] 10.201739 46.95443
#> [6,] 10.301752 47.03821
# Number of nodes
mesh$n_nodes
#> [1] 1128
# Edges
head(mesh$edges)
#> [,1] [,2]
#> [1,] 51 1065
#> [2,] 51 1064
#> [3,] 1064 1065
#> [4,] 112 255
#> [5,] 112 137
#> [6,] 137 255
# Number of edges
mesh$n_edges
#> [1] 3204
# Triangles
head(mesh$cells)
#> [,1] [,2] [,3]
#> [1,] 51 1065 1064
#> [2,] 112 255 137
#> [3,] 969 982 971
#> [4,] 97 92 102
#> [5,] 49 48 342
#> [6,] 1061 86 1062
# Number of triangles
mesh$n_cells
#> [1] 2077
# Bounding box
mesh$bbox
#> [,1] [,2]
#> [1,] 5.909523 45.79278
#> [2,] 10.584031 47.83220We visualize the resulting mesh of the spatial domain of interest.
# Interactive plot
mapview(boundary_sf,
col.regions = "gray25", alpha.regions = 0.25, col = "black", lwd = 1.5,
legend = FALSE, layer = "domain") +
mapview(mesh, crs = 4326,
col.regions = "transparent", lwd = 1.25, legend = FALSE, layer = "mesh")Figure 2: Regular mesh of the Switzerland domain with \(1\,128\) nodes and \(2\,077\) triangles.
We use the triangulation just created to define the spatial support
of a geoframe object. This object will host layers
containing data observed over the spatial support.
We import the preprocessed data from the file
../data/QSRPDE_2D/QSRPDE_2D_data.txt as a
data.frame object. We add a data layer to the geoframe
object defined above, thus obtaining a format compatible with the
implementation of the proposed regression method.
## [DATA]
# Load the data
data <- read.table(file = "../data/QSRPDE_2D/QSRPDE_2D_data.txt", header = TRUE)
head(data)
#> lon lat rainfall log.rainfall
#> 1 8.585260 47.77857 1.84 0.6097656
#> 2 8.637467 47.74956 1.21 0.1906204
#> 3 8.443961 47.74597 1.38 0.3220835
#> 4 8.760957 47.72186 1.26 0.2311117
#> 5 8.425719 47.69566 1.56 0.4446858
#> 6 8.701118 47.69077 1.00 0.0000000
# Add layer with data to the geoframe object
switzerland$insert(layer = "rainfall", type = "point", geo = c("lon", "lat"),
data = data)
switzerland
#> Geoframe with 1 layers
#> Bounding box: xmin: 5.909523 ymin: 45.79278 xmax: 10.58403 ymax: 47.8322
#> Number of nodes: 1128
#> Number of cells: 2077
#>
#> Layer: rainfall
#> Type: POINT
#> Dims: 462, 2
#> First 6 data rows:
#> rainfall log.rainfall
#> [0;31m<flt64> [0m[0;31m<flt64> [0m
#> 1.84 0.609766
#> 1.21 0.19062
#> 1.38 0.322083
#> 1.26 0.231112
#> 1.56 0.444686
#> 1 0
# Variable names
names(switzerland[["rainfall"]])
#> [1] "rainfall" "log.rainfall"
# Number of variables
ncol(switzerland[["rainfall"]])
#> [1] 2
# Locations of measurement stations (lon, lat)
head(gf_locations(switzerland[["rainfall"]]))
#> [,1] [,2]
#> [1,] 8.585260 47.77857
#> [2,] 8.637467 47.74956
#> [3,] 8.443961 47.74597
#> [4,] 8.760957 47.72186
#> [5,] 8.425719 47.69566
#> [6,] 8.701118 47.69077To visualize the rainfall distribution, we plot the data across the Switzerland region in both the original and logarithmic scales.
map1 <- mapview(switzerland[["rainfall"]], varnames = "rainfall", crs = 4326,
color_palettes = list("mako"), na.color = "transparent",
layer.name = "rainfall") +
mapview(boundary_sf, col.regions = "transparent", alpha.regions = 0,
col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)
map2 <- mapview(switzerland[["rainfall"]], crs = 4326, varnames = "log.rainfall",
color_palettes = list("mako"), na.color = "transparent",
layer.name = "log.rainfall") +
mapview(boundary_sf, col.regions = "transparent", alpha.regions = 0,
col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)
sync(map1, map2)Figure 3: Switzerland rainfall data collected at \(462\) locations, shown in the original scale (left) and in logarithmic scale (right).
The data exhibit strong anisotropic patterns across the domain. To account for the clear anisotropy observed in the data, we incorporate a stationary anisotropic diffusion term into the model. To do so, we define the differential operator as \[Lf(\mathbf{p}) = \text{div}(\mathbf{K}\nabla f(\mathbf{p})) \quad \forall \mathbf{p} \in \mathcal{D}\,.\] The anisotropy tensor \(\mathbf{K}\) is estimated using the parameter cascading algorithm presented in Bernardi et al. (2018), yielding an orientation angle of \(140^\circ (2.45 \ \text{rad})\) and an intensity of anisotropy of \(6.00\,.\)
## [DIFFUSION TENSOR]
K <- matrix(
c(1.229618413471819, 1.001009926596135, 1.001009926596135, 1.628164356689574),
nrow = 2, ncol = 2, byrow = TRUE
)We can now compute the physics-informed quantile spatial estimate,
for the \(50\%\) probability level. To
this end, we use the qsr function, setting the quantile
level with the option level = 0.5 and providing the
anistropic diffusion tensor with fe_elliptic(K = K) as a
penalty descriptor. The actual fitting is obtained by calling the
fit method of the class qsr. The optimal
smoothing parameter \(\lambda\) is
selected via Generalized Cross-Validation (GCV) minimization using grid
search method.
## [PHYSICS-INFORMED QUANTILE REGRESSION]
# Set up the finite element function (order 1)
f <- fe_function(mesh, type = "P1")
# Proposed value for the smoothing parameter
lambda_grid = 10^seq(from = -7, to = -3, by = 0.2)
# Physics informed smoothing model
m <- qsr(log.rainfall ~ f, data = switzerland, level = 0.5,
penalty = fe_elliptic(K = K)
)
# Physics informed smoothing fit with grid search for GCV minimization
fit = m$fit(
calibrator = gcv(optimizer = grid_search(grid = lambda_grid),
seed = 425)
)Moreover, it is possible to inspect the behavior of the GCV indices as a function of the values proposed for the smoothing parameter \(\lambda\,.\)
# GCV indices
gcv <- fit$values
# Optimal value selected for the smoothing parameter
lambda_opt_grid <- fit$optimum
lambda_opt_grid
#> [1] 1e-05
# Plot of the GCV curve
par(family = "serif")
plot(x = log10(lambda_grid), y = gcv, type = "b",
lwd = 2, xlab = TeX("$\\log_{10}(\\lambda)$"), ylab = "GCV")
grid()
abline(v = log10(lambda_opt_grid), lty = 2, lwd = 2, col = "royalblue")
legend("bottomright", lty = 2, lwd = 2, col = "royalblue",
legend = TeX("$\\log_{10}(\\lambda_{grid})"))Figure 4: GCV curve for the \(50\%\) quantile field.
The GCV curve is convex with minimum realized at the optimal value selected by the method – specifically \(1e-5\,.\).
We visualize the \(50\%\) quantile
estimate computed above in the logarithmic scale using
mapview. In addition, we visualize the \(50\%\) quantile estimate in the original
scale, applying the fit function to the
fe_function object.
# Interactive plot
map_log50 <- mapview(f, crs = 4326, col.regions = mako,
na.color = "transparent", layer.name = "log.rainfall") +
mapview(boundary_sf, col.regions = "transparent", alpha.regions = 0,
col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)
map_50 <- mapview(exp(f), crs = 4326, col.regions = mako,
na.color = "transparent", layer.name = "rainfall") +
mapview(boundary_sf, col.regions = "transparent", alpha.regions = 0,
col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)
sync(map_log50, map_50)Figure 5: Rainfall \(50\%\) quantile
estimate in the logarithmic scale computed by the physics-informed
QSR-PDE with optimal lambda \(1e-05\) selected via GCV minimization using
grid search method (left); rainfall \(50\%\) quantile estimate in the original
scale computed by the physics-informed QSR-PDE with optimal
lambda \(1e-05\) selected via GCV
minimization using grid search method (right).
We repeat the analysis for the quantile level \(\alpha = 90\%\) and we compare the two fitted quantile fields.
## [PHYSICS-INFORMED QUANTILE REGRESSION]
# Set up the finite element function (order 1)
f <- fe_function(mesh, type = "P1")
# Proposed value for the smoothing parameter
lambda_grid = 10^seq(from = -7, to = -4, by = 0.2)
# Physics informed smoothing model
m <- qsr(log.rainfall ~ f, data = switzerland, level = 0.9,
penalty = fe_elliptic(K = K)
)
# Physics informed smoothing fit with grid search for GCV minimization
fit = m$fit(
calibrator = gcv(optimizer = grid_search(grid = lambda_grid),
seed = 425)
)We inspect the behavior of the GCV indices as a function of the values proposed for the smoothing parameter \(\lambda\,.\)
# GCV indices
gcv <- fit$values
# Optimal value selected for the smoothing parameter
lambda_opt_grid <- fit$optimum
lambda_opt_grid
#> [1] 2.511886e-06
# Plot of the GCV curve
par(family = "serif")
plot(x = log10(lambda_grid), y = gcv, type = "b",
lwd = 2, xlab = TeX("$\\log_{10}(\\lambda)$"), ylab = "GCV")
grid()
abline(v = log10(lambda_opt_grid), lty = 2, lwd = 2, col = "royalblue")
legend("bottomright", lty = 2, lwd = 2, col = "royalblue",
legend = TeX("$\\log_{10}(\\lambda_{grid})"))Figure 6: GCV curve for the \(90\%\) quantile field.
Also in this case, the GCV curve is convex with minimum realized at the optimal value selected by the method, specifically \(2.511886e-06\,.\).
We visualize the \(90\%\) quantile estimate computed above in both the logarithmic and original scales.
# Interactive plot
map_log90 <- mapview(f, crs = 4326, col.regions = mako,
na.color = "transparent", layer.name = "log.rainfall") +
mapview(boundary_sf, col.regions = "transparent", alpha.regions = 0,
col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)
map_90 <- mapview(exp(f), crs = 4326, col.regions = mako,
na.color = "transparent", layer.name = "rainfall") +
mapview(boundary_sf, col.regions = "transparent", alpha.regions = 0,
col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)
sync(map_log90, map_90)Figure 7: Rainfall \(90\%\) quantile
estimate in the logarithmic scale computed by the physics-informed
QSR-PDE with optimal lambda \(2.5e-06\) selected via GCV minimization
using grid search method (left); rainfall \(90\%\) quantile estimate in the original
scale computed by the physics-informed QSR-PDE with optimal
lambda \(2.5e-06\) selected via GCV
minimization using grid search method (right).
Comparing the two estimated quantile fields, we observe that the median surface appears considerably smoother than the \(90\%\) quantile. This is expected, as the median is more robustness to skewness and local outliers. On the other hand, the \(90\%\) quantile surface exhibits several pronounced local spikes, highlighting regions prone to extreme precipitation. These peaks suggest that certain areas experience significantly higher rainfall extreme events, highlighting the importance of analyzing the upper-tail quantiles when assessing rainfall risk.